Last updated: 2025-06-25
Checks: 6 passed, 1 flagged
Knit directory: casper_ss_ma/analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(12345) was run prior to running the
code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
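As a small illustration of the point above, the sketch below re-seeds the generator with the same value and checks that a random subsample is reproduced exactly. The object names are hypothetical and the sampled values are not taken from this analysis.

```r
# Same seed, same draw: re-seeding reproduces a random subsample exactly.
set.seed(12345)
first_draw <- sample(1:100, 5)

set.seed(12345)
second_draw <- sample(1:100, 5)

# TRUE when both draws contain the same values in the same order
same_draw <- identical(first_draw, second_draw)
same_draw
```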
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
| absolute | relative |
|---|---|
| /Volumes/scratch/DIMA/piva/casper_ss_ma/ | .. |
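A minimal sketch of the suggested change, assuming the file is knitted from `analysis/` (so that `..` is the project root) and that an input file lives under `data/`. The file name `counts.tsv` is hypothetical; the real input files are not listed in this report.

```r
# Build the path relative to the knit directory (analysis/) instead of
# hard-coding /Volumes/scratch/DIMA/piva/casper_ss_ma/.
# "counts.tsv" is a hypothetical file name used only for illustration.
counts_file <- file.path("..", "data", "counts.tsv")
counts_file
```

Because the path no longer names a specific volume, the same code should resolve on any machine that has a copy of the project.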
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b7d70ab. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish or
wflow_git_commit). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Untracked files:
Untracked: .DS_Store
Untracked: analysis/.DS_Store
Untracked: analysis/02_degs_go_aneuploidy_median.Rmd
Untracked: analysis/03_degs_go_CD82expr_median.Rmd
Untracked: analysis/VennDiagram.2025-06-09_13-53-40.335615.log
Untracked: analysis/VennDiagram.2025-06-09_13-54-51.029086.log
Untracked: analysis/VennDiagram.2025-06-09_13-55-15.147126.log
Untracked: analysis/VennDiagram.2025-06-09_13-56-18.122749.log
Untracked: analysis/VennDiagram.2025-06-09_13-56-30.934079.log
Untracked: analysis/VennDiagram.2025-06-09_14-18-19.412377.log
Untracked: analysis/VennDiagram.2025-06-18_10-28-53.699452.log
Untracked: analysis/VennDiagram.2025-06-18_10-37-36.77178.log
Untracked: analysis/VennDiagram.2025-06-18_11-32-36.228427.log
Untracked: analysis/VennDiagram.2025-06-18_15-38-55.387683.log
Untracked: analysis/VennDiagram.2025-06-18_15-48-17.579371.log
Untracked: analysis/VennDiagram.2025-06-18_17-18-17.268774.log
Untracked: analysis/VennDiagram.2025-06-19_11-11-17.376961.log
Untracked: analysis/VennDiagram.2025-06-19_14-52-46.049026.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-05.861139.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-07.33202.log
Untracked: analysis/VennDiagram.2025-06-19_16-40-08.673023.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-05.238063.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-07.22979.log
Untracked: analysis/VennDiagram.2025-06-19_17-50-09.007028.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-01.885712.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-03.579702.log
Untracked: analysis/VennDiagram.2025-06-19_18-48-04.898695.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-23.300456.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-24.588109.log
Untracked: analysis/VennDiagram.2025-06-20_10-18-26.077856.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-54.081682.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-55.516535.log
Untracked: analysis/VennDiagram.2025-06-20_10-50-56.913582.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-43.68944.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-45.681514.log
Untracked: analysis/VennDiagram.2025-06-20_11-10-47.126222.log
Untracked: analysis/VennDiagram.2025-06-20_12-19-10.326514.log
Untracked: analysis/VennDiagram.2025-06-20_12-19-11.75991.log
Untracked: analysis/VennDiagram.2025-06-20_12-19-13.198666.log
Untracked: analysis/VennDiagram.2025-06-20_12-29-09.447741.log
Untracked: analysis/VennDiagram.2025-06-20_12-29-11.214146.log
Untracked: analysis/VennDiagram.2025-06-20_12-29-12.791818.log
Untracked: analysis/VennDiagram.2025-06-20_12-44-02.971891.log
Untracked: analysis/VennDiagram.2025-06-20_12-44-04.709094.log
Untracked: analysis/VennDiagram.2025-06-20_12-44-06.321173.log
Untracked: analysis/VennDiagram.2025-06-24_15-54-45.065538.log
Untracked: analysis/VennDiagram.2025-06-24_15-54-48.303942.log
Untracked: analysis/VennDiagram.2025-06-24_15-54-50.098014.log
Untracked: analysis/VennDiagram.2025-06-25_11-53-49.958809.log
Untracked: analysis/VennDiagram.2025-06-25_11-53-51.64026.log
Untracked: analysis/VennDiagram.2025-06-25_11-53-53.29465.log
Untracked: analysis/VennDiagram.2025-06-25_15-09-09.87969.log
Untracked: analysis/VennDiagram.2025-06-25_15-09-14.193409.log
Untracked: analysis/VennDiagram.2025-06-25_15-09-17.485413.log
Untracked: analysis/VennDiagram.2025-06-25_15-34-29.722117.log
Untracked: analysis/VennDiagram.2025-06-25_15-34-31.791802.log
Untracked: analysis/VennDiagram.2025-06-25_15-34-34.21193.log
Untracked: analysis/VennDiagram.2025-06-25_15-56-06.690484.log
Untracked: analysis/VennDiagram.2025-06-25_15-56-09.642419.log
Untracked: analysis/VennDiagram.2025-06-25_15-56-11.686973.log
Untracked: analysis/hsa04064.HLT-HighAS_vs_HLT-LowAS.png
Untracked: analysis/hsa04064.HLT-HighCD82_vs_HLT-LowCD82.png
Untracked: analysis/hsa04064.HRplus-HighAS_vs_HRplus-LowAS.png
Untracked: analysis/hsa04064.HRplus-HighCD82_vs_HRplus-LowCD82.png
Untracked: analysis/hsa04064.HRplus_vs_HLT.png
Untracked: analysis/hsa04064.TNBC-HighAS_vs_TNBC-LowAS.png
Untracked: analysis/hsa04064.TNBC-HighCD82_vs_TNBC-LowCD82.png
Untracked: analysis/hsa04064.TNBC_vs_HLT.png
Untracked: analysis/hsa04064.TNBC_vs_HRplus.png
Untracked: analysis/hsa04064.png
Untracked: analysis/hsa04064.xml
Untracked: code/
Untracked: data/
Untracked: degs_HRplus-HighAS_vs_HRplus-LowAS.csv
Untracked: output/
Unstaged changes:
Modified: analysis/00_casper_analysis.Rmd
Deleted: analysis/02_deconvolution.Rmd
Modified: analysis/04_deconvolution.Rmd
Modified: casper_ss_ma.Rproj
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/02_degs_go_aneuploidy.Rmd)
and HTML (docs/02_degs_go_aneuploidy.html) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote),
click on the hyperlinks in the table below to view the files as they
were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | b7d70ab | annamariapiva | 2025-06-25 | updated notebooks 02, 03 |
| html | 61d931d | annamariapiva | 2025-06-25 | Build site. |
| Rmd | 8bb180c | annamariapiva | 2025-06-25 | updated notebooks 01, 02, 03 and 04 |
| html | a039b3f | annamariapiva | 2025-06-20 | Build site. |
| Rmd | f0e862c | annamariapiva | 2025-06-20 | new reports |
The goal of this analysis is to identify which pathways are up- or down-regulated in samples with a high versus low Aneuploidy Score (AS), as computed by the CaSPeR pipeline. Within each condition (Healthy, HR+, and TNBC), patients are divided into high and low aneuploidy score groups, and the following comparisons are performed:
HR+ High-AS vs HR+ Low-AS
TNBC High-AS vs TNBC Low-AS
Healthy High-AS vs Healthy Low-AS
The analysis includes:
The input for the following analysis is:
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
The first step of the analysis in R is to load the required packages, load the input data mentioned above, and set the thresholds used throughout the analysis.
To classify samples into High and Low Aneuploidy Score groups, we examined the distribution of aneuploidy scores across all samples from the three conditions. The distribution appeared bimodal, suggesting the presence of two distinct populations. To separate these, we defined a cutoff at the local minimum between the two peaks.
In the distribution plots:
The blue line indicates the median aneuploidy score of the displayed samples.
The red line marks the cutoff point, corresponding to the local minimum used to define the High vs Low groups.
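The cutoff described above can be sketched as follows. This is an illustration on simulated scores, not the code used for the report: it estimates a kernel density, locates the two tallest peaks, and takes the lowest density point between them as the cutoff. All object names and the simulated values are assumptions.

```r
set.seed(12345)
# Simulated bimodal scores (illustrative only, not the study data)
scores <- c(rnorm(200, mean = 0.2, sd = 0.05),
            rnorm(200, mean = 0.6, sd = 0.05))

dens <- density(scores)
y <- dens$y
n <- length(y)

# Local maxima: interior grid points higher than both neighbours
interior <- 2:(n - 1)
peaks <- interior[y[interior] > y[interior - 1] & y[interior] > y[interior + 1]]

# Keep the two tallest peaks, ordered along the score axis
top2 <- sort(peaks[order(y[peaks], decreasing = TRUE)][1:2])

# Cutoff: the point of lowest density between the two peaks
between <- top2[1]:top2[2]
cutoff <- dens$x[between[which.min(y[between])]]

# Assign each sample to a group relative to the cutoff
group <- ifelse(scores > cutoff, "High", "Low")
table(group)
```

On real data the result depends on the density bandwidth, so it is worth checking the cutoff visually against the histogram, as done in the plots.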

| Version | Author | Date |
|---|---|---|
| a039b3f | annamariapiva | 2025-06-20 |

Differential expression analysis is performed with a custom function that accounts for batch effects. A batch effect occurs when non-biological factors, such as laboratory conditions or the instruments used, introduce systematic changes in the data produced by an experiment. Lowly expressed genes are removed to reduce noise. Here, lowly expressed genes are defined as:
Let’s look at the PCA and at gene expression patterns across samples. The batch effect is accounted for in the design, but it has not been removed from the data shown in this plot.
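To make "considered in the design" concrete, here is a minimal base R sketch of a design matrix with batch as a covariate alongside the group of interest. The sample table, batch labels, and column names are hypothetical, and the custom function used in this report is not shown, so its actual design may differ. In DESeq2 the analogous formula would be passed as `design = ~ batch + group`.

```r
# Hypothetical sample annotation: three batches, two AS groups
samples <- data.frame(
  sample = paste0("S", 1:6),
  batch  = factor(c("B1", "B1", "B2", "B2", "B3", "B3")),
  group  = factor(c("LowAS", "HighAS", "LowAS", "HighAS", "LowAS", "HighAS"),
                  levels = c("LowAS", "HighAS"))
)

# Batch enters the model as a covariate, so the group coefficient is
# estimated after adjusting for batch; the expression values themselves
# are left unchanged.
design <- model.matrix(~ batch + group, data = samples)
colnames(design)
```

This is why a PCA drawn from the uncorrected expression values can still show batch structure even though the test statistics are adjusted for it.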
Here is the PCA of the selected samples from the first comparison.
Genes are annotated as significant or not significant. A gene is considered significant when its adjusted p-value is below the threshold set above and its absolute log2FoldChange is greater than the cutoff set above.
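The annotation rule can be sketched as below. The threshold values are placeholders, since the values set earlier in the analysis are not printed in this report, and the toy results table is invented for illustration.

```r
# Placeholder thresholds (assumptions; the report's own values are not shown)
padj_threshold <- 0.05
lfc_cutoff <- 1

# Toy results table with the column names DESeq2 uses
res <- data.frame(
  gene = c("G1", "G2", "G3", "G4"),
  log2FoldChange = c(2.3, -1.8, 0.4, 3.1),
  padj = c(0.001, 0.02, 0.0004, NA)
)

# A gene is significant only if both conditions hold;
# genes with a missing adjusted p-value are treated as not significant
res$significant <- !is.na(res$padj) &
  res$padj < padj_threshold &
  abs(res$log2FoldChange) > lfc_cutoff

res
```

Handling `NA` explicitly matters because a comparison with a missing adjusted p-value would otherwise propagate `NA` into the annotation column.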
Using the significant genes identified among the differentially expressed genes, let’s visualize the top 20 and all the DE genes.
Meaning of Colors


Further analysis is done through gene set enrichment analysis (GSEA), which, unlike the previous step, does not exclude genes based on log2 fold change or adjusted p-value. GSEA is performed separately on each GO subontology: biological process (BP), cellular component (CC), and molecular function (MF). The dot plot below shows the top 10 most enriched GO terms. The size of each dot reflects the count of differentially expressed genes associated with the GO term, and the color of each dot reflects the significance of the enrichment.
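Because GSEA uses every gene rather than a filtered subset, its input is a ranked, named vector covering the full results table. The sketch below shows one way to build such a vector in base R. The ranking metric (a test statistic in a column called `stat`) and the rule for duplicated gene identifiers are assumptions, not necessarily what this report uses.

```r
# Toy results table; "stat" stands in for whatever ranking metric is chosen
res <- data.frame(
  gene = c("G1", "G2", "G3", "G4", "G2"),
  stat = c(4.2, -3.1, 0.2, NA, -2.0)
)

# Drop genes without a statistic
res <- res[!is.na(res$stat), ]

# For duplicated identifiers keep the entry with the largest absolute value
res <- res[order(abs(res$stat), decreasing = TRUE), ]
res <- res[!duplicated(res$gene), ]

# Named vector sorted in decreasing order, with no significance filter applied
ranks <- sort(setNames(res$stat, res$gene), decreasing = TRUE)
ranks
```

A vector of this shape is what functions such as `clusterProfiler::gseGO()` (argument `geneList`) and `fgsea::fgsea()` (argument `stats`) expect as input.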
To visualize gene expression changes on biological pathways, we used the pathview R package, which maps gene-level statistics (e.g., log2 fold-changes) onto KEGG pathway diagrams.
For each contrast in our differential expression analysis, we extracted significantly differentially expressed genes and passed their log2 fold-change values to pathview() to visualize the NF-kappa B signaling pathway (KEGG pathway ID “hsa04064”). Pathway visualizations highlight upregulated and downregulated genes in red and blue, respectively, based on log2 fold-change.
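A hedged sketch of such a call follows. The wrapper name `plot_nfkb`, the example fold changes, and the colour settings are assumptions chosen to match the description above (the pathview default for down-regulated genes is green, so blue has to be set explicitly). The function is only defined here, not run, since it needs the pathview package and network access to KEGG.

```r
# Illustrative log2 fold-changes keyed by Entrez Gene ID (values are made up)
lfc <- c("4790" = 1.4, "5970" = -0.8, "3551" = 2.1)

# Hypothetical wrapper: draw the NF-kappa B pathway for one contrast.
# pathview writes a file named hsa04064.<out.suffix>.png to the working directory.
plot_nfkb <- function(gene_lfc, contrast) {
  pathview::pathview(
    gene.data  = gene_lfc,        # named numeric vector, names = Entrez IDs
    pathway.id = "04064",         # NF-kappa B signaling pathway
    species    = "hsa",
    out.suffix = contrast,
    low  = list(gene = "blue", cpd = "blue"),    # down-regulated genes
    mid  = list(gene = "gray", cpd = "gray"),
    high = list(gene = "red", cpd = "yellow")    # up-regulated genes
  )
}

# Example usage (not run here):
# plot_nfkb(lfc, "HRplus-HighAS_vs_HRplus-LowAS")
```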
[1] "Note: 4590 of 14347 unique input IDs unmapped."
Here is the PCA of the selected samples from the second comparison.
Genes are annotated as significant or not significant. A gene is considered significant when its adjusted p-value is below the threshold set above and its absolute log2FoldChange is greater than the cutoff set above.
Using the significant genes identified among the differentially expressed genes, let’s visualize the top 20 and all the DE genes.
Meaning of Colors


Here is the PCA of the selected samples from the third comparison.
Genes are annotated as significant or not significant. A gene is considered significant when its adjusted p-value is below the threshold set above and its absolute log2FoldChange is greater than the cutoff set above.
Using the significant genes identified among the differentially expressed genes, let’s visualize the top 20 and all the DE genes.
Meaning of Colors



| Pathway | HRplus-HighAS_vs_HRplus-LowAS | TNBC-HighAS_vs_TNBC-LowAS | HLT-HighAS_vs_HLT-LowAS |
|---|---|---|---|
| HALLMARK_ADIPOGENESIS | 0.3195648 | 0.8142424 | -1.2551732 |
| HALLMARK_ALLOGRAFT_REJECTION | NA | 0.9522570 | -2.0372233 |
| HALLMARK_ANDROGEN_RESPONSE | -0.7965541 | 0.5914140 | -0.9797684 |
| HALLMARK_ANGIOGENESIS | -1.2871963 | 0.8446955 | -1.2955316 |
| HALLMARK_APICAL_JUNCTION | NA | 1.1562512 | -1.0381009 |
| HALLMARK_APICAL_SURFACE | 0.5208629 | 0.8373958 | 0.7755034 |
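A pathway-by-contrast matrix like the one printed above can be assembled from per-contrast score vectors as sketched below. This is an assumption about how such a matrix could be built, not the report's code: the list names and rounded values are illustrative, and a pathway missing from one contrast is filled with `NA`.

```r
# Illustrative per-contrast scores; a pathway may be absent from a contrast
per_contrast <- list(
  HRplus = c(HALLMARK_ADIPOGENESIS = 0.32, HALLMARK_ANGIOGENESIS = -1.29),
  TNBC   = c(HALLMARK_ADIPOGENESIS = 0.81,
             HALLMARK_ALLOGRAFT_REJECTION = 0.95,
             HALLMARK_ANGIOGENESIS = 0.84)
)

# Union of all pathway names across contrasts
pathways <- sort(unique(unlist(lapply(per_contrast, names))))

# Index each vector by the full pathway list; missing names yield NA
score_matrix <- sapply(per_contrast, function(x) unname(x[pathways]))
rownames(score_matrix) <- pathways

score_matrix
```

Keeping the `NA` entries, rather than dropping incomplete rows, preserves pathways that were scored in only some contrasts.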

| Version | Author | Date |
|---|---|---|
| a039b3f | annamariapiva | 2025-06-20 |
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.4.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Rome
tzcode source: internal
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] VennDiagram_1.7.3 futile.logger_1.4.3
[3] pathview_1.40.0 tibble_3.3.0
[5] fgsea_1.26.0 msigdbr_24.1.0
[7] gridExtra_2.3 dplyr_1.1.4
[9] clusterProfiler_4.8.2 plotly_4.10.4
[11] reshape_0.8.9 ggplot2_3.5.2
[13] gplots_3.2.0 RColorBrewer_1.1-3
[15] ComplexHeatmap_2.16.0 rtracklayer_1.60.1
[17] DESeq2_1.40.2 SummarizedExperiment_1.30.2
[19] Biobase_2.60.0 MatrixGenerics_1.12.3
[21] matrixStats_1.5.0 GenomicRanges_1.52.1
[23] GenomeInfoDb_1.36.4 IRanges_2.34.1
[25] S4Vectors_0.38.2 BiocGenerics_0.46.0
[27] DT_0.33
loaded via a namespace (and not attached):
[1] splines_4.3.1 later_1.4.2 BiocIO_1.10.0
[4] bitops_1.0-9 ggplotify_0.1.2 polyclip_1.10-7
[7] graph_1.78.0 XML_3.99-0.18 lifecycle_1.0.4
[10] doParallel_1.0.17 rprojroot_2.0.4 lattice_0.22-7
[13] MASS_7.3-60 crosstalk_1.2.1 magrittr_2.0.3
[16] sass_0.4.10 rmarkdown_2.29 jquerylib_0.1.4
[19] yaml_2.3.10 httpuv_1.6.16 cowplot_1.1.3
[22] DBI_1.2.3 abind_1.4-8 zlibbioc_1.46.0
[25] purrr_1.0.4 ggraph_2.2.1 RCurl_1.98-1.17
[28] yulab.utils_0.2.0 tweenr_2.0.3 git2r_0.36.2
[31] circlize_0.4.16 GenomeInfoDbData_1.2.10 enrichplot_1.20.0
[34] ggrepel_0.9.6 tidytree_0.4.6 codetools_0.2-20
[37] DelayedArray_0.26.7 DOSE_3.26.2 ggforce_0.4.2
[40] tidyselect_1.2.1 shape_1.4.6.1 aplot_0.2.5
[43] farver_2.1.2 viridis_0.6.5 GenomicAlignments_1.36.0
[46] jsonlite_2.0.0 GetoptLong_1.0.5 tidygraph_1.3.1
[49] iterators_1.0.14 foreach_1.5.2 tools_4.3.1
[52] treeio_1.24.3 Rcpp_1.0.14 glue_1.8.0
[55] xfun_0.52 qvalue_2.32.0 withr_3.0.2
[58] formatR_1.14 fastmap_1.2.0 caTools_1.18.3
[61] digest_0.6.37 R6_2.6.1 gridGraphics_0.5-1
[64] colorspace_2.1-1 GO.db_3.17.0 gtools_3.9.5
[67] RSQLite_2.4.1 tidyr_1.3.1 generics_0.1.4
[70] data.table_1.17.6 graphlayouts_1.2.2 httr_1.4.7
[73] htmlwidgets_1.6.4 S4Arrays_1.0.6 scatterpie_0.2.4
[76] whisker_0.4.1 pkgconfig_2.0.3 gtable_0.3.6
[79] blob_1.2.4 workflowr_1.7.1 XVector_0.40.0
[82] shadowtext_0.1.4 htmltools_0.5.8.1 clue_0.3-66
[85] scales_1.4.0 png_0.1-8 ggfun_0.1.8
[88] lambda.r_1.2.4 knitr_1.50 rstudioapi_0.17.1
[91] reshape2_1.4.4 rjson_0.2.23 nlme_3.1-168
[94] curl_6.3.0 org.Hs.eg.db_3.17.0 cachem_1.1.0
[97] GlobalOptions_0.1.2 stringr_1.5.1 KernSmooth_2.23-26
[100] parallel_4.3.1 HDO.db_0.99.1 AnnotationDbi_1.62.2
[103] restfulr_0.0.15 pillar_1.10.2 vctrs_0.6.5
[106] promises_1.3.3 cluster_2.1.8.1 Rgraphviz_2.44.0
[109] evaluate_1.0.4 KEGGgraph_1.60.0 cli_3.6.5
[112] locfit_1.5-9.12 compiler_4.3.1 futile.options_1.0.1
[115] Rsamtools_2.16.0 rlang_1.1.6 crayon_1.5.3
[118] labeling_0.4.3 plyr_1.8.9 fs_1.6.6
[121] stringi_1.8.7 viridisLite_0.4.2 BiocParallel_1.34.2
[124] assertthat_0.2.1 babelgene_22.9 Biostrings_2.68.1
[127] lazyeval_0.2.2 GOSemSim_2.26.1 Matrix_1.6-4
[130] patchwork_1.3.0 bit64_4.6.0-1 KEGGREST_1.40.1
[133] igraph_2.1.4 memoise_2.0.1 bslib_0.9.0
[136] ggtree_3.8.2 fastmatch_1.1-6 bit_4.6.0
[139] downloader_0.4.1 ape_5.8-1 gson_0.1.0